INTRODUCTION

 

The diverse ethnic landscape of the United Kingdom prompts a critical examination of interactions with local law enforcement. This study seeks to uncover any potential disparities in policing practices within this cosmopolitan society.

 

This research endeavours to reveal biases,if any, within varied ethnic encounters with law enforcement by:
- Analysing ethnic diversity in police force and its impact on stops
- Investigating biases in associating searched objects with suspect ethnicity
- Assessing disparities in post-stop-and-search treatments

   

DATA

 

About Datset:

The study is centered on the 2021 stop-and-search police dataset, investigating interactions among various ethnicities during that year. Additionally, it leverages the concurrent release of the 2021 census data to enhance demographic insights without adjusting for population changes,as the study is confined to this single year’s datasets.Other datasets used in the study are referenced where they contribute to the analysis.

 

Data Processing:

1. To align police stop-and-search data, gathered by territorial force areas, with census data collected at the local authorities’ level, aggregation was needed to match local authorities with their corresponding territorial force areas.
2. The Census data, with 19 sub-ethnicities, was grouped into broader categories: Asian, Black, Mixed, White, and Other, with the primary analysis concentrating on these major ethnic groups
3. Similarly,police records categorised officers’ ethnicities into major ethnicities, whereas suspects’ sub-ethnicities were recorded. The latter were aggregated into major ethnicities for facilitating comparison and analysis.
4. Though the study glimpses into ethnic representation at police jurisdiction levels, it transitions from a focus on sub-areas to a national-level analysis. It aimed to overcome limitations posed by varied, small-scale sub-ethnicities across regions. Limited statistical significance and variation among these subsets impeded a detailed analysis, while diverse population sizes within regions added complexity to understanding these sub-ethnicities comprehensively.
5. The British Transport Police (focus on railway policing) was omitted from analysis to maintain a focus on police jurisdictions under local authorities’ purview.
6. The stop-and-search dataset includes individuals who self-reported their ethnicity as “Unknown.” These instances were excluded from the analysis to align with population-relative datasets, as census records lack estimates for such categories.
7. RSQLite was utilised to organise and store the extensive dataset (police stop-search data). Each analysis on this dataset was conducted using targeted SQL queries.
8. The confidence in local police dataset is derived from survey data, not reflective of the entire populace.

   

ANALYSIS

 

1. Background

 

1.1. To gauge the cosmopolitanism of the UK population, these maps illustrate the global origins of residents in the country and their distribution relative to the total population across Northern Ireland, Wales, Scotland, and England.

 

   

1.2. The map exhibits police force area boundaries in England and Wales

 

   

1.3. The map shows ethnicities as a proportion of the total population in each police force area.

 

   

2. Findings

   

2.1. Police Force Analysis

A diverse police force is crucial for building trust, community relations, and operational effectiveness. Reflecting community diversity fosters understanding, fair treatment, and effective crime resolution while reducing biases and improving communication for equitable policing.

 

2.1.1. Police Force Diversity
The plot below depicts ethnic diversity among officers relative to the total workforce in each police force:

 

2.1.2. Police Force Representation: National Level

 

 

The resulting ratios indicate the proportional representation of different ethnicities within law enforcement relative to their respective population sizes. For instance, a ratio of 2.57 White officers per 1,000 White individuals suggests a disproportionately higher representation, while significantly lower ratios for Black and Other officers indicate potential underrepresentation. Such disparities can profoundly impact societal dynamics in a diverse population. A predominantly White police force may contribute to perceptions of systemic bias and institutional inequity, potentially eroding trust and legitimacy among ethnic minority communities. This underrepresentation can exacerbate social tensions and hinder effective community policing. Conversely, achieving proportional representation in law enforcement is essential for fostering trust, ensuring equitable treatment, and promoting social cohesion within a multicultural society

 

2.2. Stop-Search Analysis

 

2.2.1. General Overview
The map displays the percentage of individuals from each ethnicity stopped relative to their respective local populations in each police force area.

 

 

The following illustrations delineate the total 2021 cases for various ethnicities and the gender distribution within each ethnicity:

 

   

2.2.2. Analysing Ethnic Diversity and its Impact on Stops

 

2.2.2.1. Incidences per Ethnicity

 

 

The data reveals a markedly higher incidence of stop-and-search practices targeting individuals from Black and Other ethnicities compared to those from White, Mixed, and Asian backgrounds. This stark disparity can be attributed to several factors. Firstly, implicit bias plays a significant role, as officers may unconsciously associate certain ethnicities with higher criminal activity, leading to disproportionate targeting. Historical prejudices and long-standing societal stereotypes further exacerbate this issue, influencing policing practices and perpetuating biased behaviors. Finally, institutional practices within law enforcement agencies may inadvertently prioritize certain areas or behaviors that disproportionately affect specific ethnic groups. These factors collectively contribute to the observed disparities in stop-and-search incidents, highlighting the complex interplay between societal biases and institutional policies.

 

2.2.2.2. Ethnicity Proportion in Suspects Stopped by Officers

 

 

Officers often tend to stop suspects from their own ethnic background, but notably, ‘Mixed’ officers also frequently stop individuals from ‘Other’ ethnicities. This could be attributed to officers’ comfort levels in interacting with individuals who share similar cultural backgrounds. This familiarity might facilitate smoother communication and rapport, potentially improving the quality of interactions during stop-and-search incidents. However, this practice raises concerns about biases and fairness in policing, as it may inadvertently perpetuate unequal treatment based on ethnicity.
Furthermore, there is a noticeable trend where officers might preferentially engage with individuals of their own ethnicity to avoid perceptions of racial profiling. This cautious approach aims to mitigate accusations of bias in policing practices. Consequently, it may lead to a higher proportion of intra-ethnic stops, potentially affecting the overall equity and fairness of law enforcement interactions. This phenomenon underscores the complex interplay between officers’ decision-making processes, public perception, and the need to uphold impartiality in policing.
Moreover, the significant number of ‘Mixed’ officers stopping individuals from ‘Other’ ethnicities could reflect their unique positionality and experiences within the police force. These officers navigate multiple cultural contexts, which may influence their policing strategies and interactions. This intersectional perspective adds layers of complexity to how policing practices are carried out and perceived, highlighting the dynamic nature of officer behavior and its impact on diverse communities.

 

2.2.3. Investigating Biases in Associating Searched Objects with Suspect Ethnicity

 

It examines if specific ethnicities are disproportionately linked to items such as controlled drugs, offensive weapons, psychoactive substances, and wildlife offences, chosen for their law enforcement relevance.

 

 

The disproportionate representation of the ‘Other’ ethnic group across all search objects, alongside lower representation of White individuals in most search categories, suggests a complex interplay of societal perceptions and policing practices. This disparity may reflect underlying biases and stereotypes about crime and ethnicity within law enforcement. Officers’ decisions during stop-and-search incidents could be influenced by implicit biases, where certain ethnic groups are more likely to be perceived as suspicious or involved in criminal activity based on prevailing social stereotypes. Moreover, disparities in search outcomes for ‘Other’ and ‘Black’ ethnicities in specific categories may indicate differential treatment in policing, highlighting potential inequities in how laws are enforced across different ethnic communities.

 

2.2.4. Assessing Disparities in Post-Stop-and-Search Treatments

 

Grouping outcomes into categories like “No action,” “Arrest,” and “Warning” captures diverse responses post-stop and search. This enables examination of how different ethnic groups are treated following these encounters for potential disparities.

 

 

The disparities in post-stop outcomes, where ‘Other’ and Black ethnicities experience higher arrest and warning rates compared to White individuals despite similar initial encounters with law enforcement, reflect complex societal and institutional dynamics. These disparities often stem from several intertwined factors. Firstly, implicit biases within policing can lead officers to perceive individuals from ‘Other’ and Black ethnicities as more suspicious or threatening, influencing their decisions to escalate encounters towards arrest or warning. Such biases may be rooted in societal stereotypes and historical prejudices, which shape perceptions of crime and safety based on race or ethnicity.Secondly, systemic factors within law enforcement institutions, such as policies on crime prevention and community safety, may inadvertently target or disproportionately affect certain ethnic groups. This can result in heightened surveillance, more frequent stops, and increased likelihood of punitive outcomes for individuals from these communities.
The consequences of these disparities are multifaceted. They can contribute to a sense of injustice and alienation within affected communities, eroding trust in law enforcement and undermining public cooperation with police efforts. Moreover, persistent disparities in post-stop actions can perpetuate cycles of inequality and disadvantage, impacting social cohesion and reinforcing divisions along racial and ethnic lines.

 

2.3. Confidence in Local Police

 

 

The shifts in confidence levels towards local police among ethnic groups are influenced by various interconnected factors. Historical experiences of discrimination and unequal treatment within law enforcement contribute significantly to declining trust across communities. Perceptions of biased policing practices, including disparities in stop-and-search incidents and post-stop outcomes, reinforce these sentiments, heightening feelings of mistrust and dissatisfaction. Moreover, broader socio-economic factors such as inequalities in access to resources and opportunities play a crucial role in shaping perceptions of police effectiveness and fairness, impacting overall confidence. Similarly, the variability in confidence within ethnic groups underscores diverse experiences and perceptions of law enforcement interactions, influenced by factors such as the visibility of police presence, experiences of discrimination, and individual encounters with officers.

 

 

Among major ethnic subgroups, the Chinese and ‘Asian Other’ have relatively higher confidence compared to other Asian ethnicities. The Black subgroup ,in general, have shown decreasing confidence with greatest variation in ‘Black Other’. Similarly, White Irish demonstrate higher confidence compared to White British in White subgroup.

   

3. Conclusion and Recommendations

The analysis of the 2021 stop-and-search police dataset reveals profound disparities in interactions between law enforcement and different ethnic groups within the United Kingdom. These findings underscore systemic biases ingrained in policing practices, evident in the disproportionate targeting of Black and Other ethnicities. Implicit associations linking certain ethnicities to heightened criminality perpetuate unequal treatment, posing significant challenges to justice and equity in policing strategies. Moreover, the underrepresentation of diverse ethnic groups within the police force not only amplifies perceptions of institutional bias but also hampers the ability of law enforcement to effectively engage with and serve multicultural communities, thus undermining trust and legitimacy.
To mitigate these challenges and enhance policing practices in a multicultural context, proactive measures must be undertaken. Increasing ethnic diversity within police forces through targeted recruitment strategies and comprehensive training on cultural sensitivity, implicit bias, and community engagement is essential. This diversity of perspectives within law enforcement can improve relations and ensure responsive policing strategies that resonate with diverse community values. Also, regular reviews of stop-and-search policies are necessary to ensure they are evidence-based, transparent, and accountable. Implementing procedural justice principles can enhance fairness, legitimacy, and public trust in law enforcement practices. By fostering a culture of fairness and accountability, policymakers can mitigate disparities and promote equitable treatment under the law. Finally, fostering proactive engagement between law enforcement agencies and multicultural communities through collaborative partnerships, outreach programs, and community policing initiatives is crucial. Building trust and mutual understanding is key to addressing biases, enhancing public safety, and fostering positive relationships that transcend cultural differences.
In conclusion, addressing disparities in policing practices requires a comprehensive approach that integrates policy reforms, community engagement, and institutional accountability. By prioritizing equity and inclusivity in law enforcement strategies, policymakers can advance towards a more just and cohesive society where all individuals are treated with dignity and fairness, regardless of ethnicity or background.

   

APPENDIX

 

Appendix 1: Local Authorities under Police Force Area

Police_Force Areas
Avon and Somerset Bath and North East Somerset,Bristol, City of,North Somerset,South Gloucestershire,Mendip,Sedgemoor,South Somerset,Taunton Deane,West Somerset
Bedfordshire Bedford,Luton,Central Bedfordshire
Cambridgeshire Cambridge,East Cambridgeshire,Fenland,Huntingdonshire,Peterborough,South Cambridgeshire
Cheshire Halton,Warrington,Cheshire East,Cheshire West and Chester
Cleveland Hartlepool,Redcar and Cleveland,Middlesbrough,Stockton-on-Tees
Cumbria Allerdale,Barrow-in-Furness,Carlisle,Copeland,Eden,South Lakeland
Derbyshire Amber Valley,Bolsover,Chesterfield,Derby,Derbyshire Dales,Erewash,High Peak,North East Derbyshire,South Derbyshire
Devon & Cornwall Isles of Scilly,Plymouth,Torbay,Cornwall,East Devon,Mid Devon,North Devon,Torridge,South Hams,Teignbridge,West Devon,Exeter
Dorset Bournemouth,Poole,Christchurch,East Dorset,North Dorset,Purbeck,West Dorset,Weymouth and Portland
Durham Darlington,County Durham
Dyfed-Powys Powys,Ceredigion,Pembrokeshire,Carmarthenshire
Essex Basildon,Braintree,Brentwood,Castle Point,Chelmsford,Colchester,Epping Forest,Harlow,Maldon,Rochford,Southend-on-Sea,Tendring,Thurrock,Uttlesford
Gloucestershire Cheltenham,Cotswold,Forest of Dean,Gloucester,Stroud,Tewkesbury
Greater Manchester Bolton,Bury,Manchester,Oldham,Rochdale,Salford,Stockport,Tameside,Trafford,Wigan
Gwent Caerphilly,Blaenau Gwent,Torfaen,Monmouthshire,Newport
Hampshire East Hampshire,Eastleigh,Fareham,Gosport,Havant,Isle of Wight,New Forest,Portsmouth,Southampton,Test Valley,Winchester,Basingstoke and Deane,Hart,Rushmoor
Hertfordshire Broxbourne,Dacorum,East Hertfordshire,Hertsmere,North Hertfordshire,St Albans,Stevenage,Three Rivers,Watford,Welwyn Hatfield
Humberside East Riding of Yorkshire,Kingston upon Hull, City of,North East Lincolnshire,North Lincolnshire
Kent Ashford,Canterbury,Dover,Maidstone,Medway,Sevenoaks,Folkestone and Hythe,Swale,Thanet,Tonbridge and Malling,Tunbridge Wells,Dartford,Gravesham
Lancashire Blackburn with Darwen,Blackpool,Burnley,Chorley,Fylde,Hyndburn,Lancaster,Pendle,Preston,Ribble Valley,Rossendale,South Ribble,West Lancashire,Wyre
Leicestershire Blaby,Charnwood,Harborough,Hinckley and Bosworth,Leicester,Melton,North West Leicestershire,Oadby and Wigston,Rutland
Lincolnshire Boston,East Lindsey,Lincoln,North Kesteven,South Holland,South Kesteven,West Lindsey
London, City of City of London
Merseyside Knowsley,Liverpool,Sefton,St. Helens,Wirral
Metropolitan Police Barking and Dagenham,Barnet,Bexley,Brent,Bromley,Camden,Westminster,Croydon,Ealing,Enfield,Greenwich,Hackney,Hammersmith and Fulham,Haringey,Harrow,Havering,Hillingdon,Hounslow,Islington,Kensington and Chelsea,Kingston upon Thames,Lambeth,Lewisham,Merton,Newham,Redbridge,Richmond upon Thames,Southwark,Sutton,Tower Hamlets,Waltham Forest,Wandsworth
Norfolk Breckland,Broadland,Great Yarmouth,King’s Lynn and West Norfolk,North Norfolk,Norwich,South Norfolk
North Wales Isle of Anglesey,Gwynedd,Conwy,Denbighshire,Flintshire,Wrexham
North Yorkshire York,Craven,Hambleton,Harrogate,Richmondshire,Ryedale,Scarborough,Selby
Northamptonshire Corby,East Northamptonshire,Kettering,Northampton,Wellingborough,Daventry,South Northamptonshire
Northumbria Gateshead,Newcastle upon Tyne,North Tyneside,South Tyneside,Sunderland,Northumberland
Nottinghamshire Ashfield,Bassetlaw,Mansfield,Newark and Sherwood,Nottingham,Broxtowe,Gedling,Rushcliffe
South Wales Swansea,Neath Port Talbot,Bridgend,Vale of Glamorgan,Cardiff,Rhondda Cynon Taf,Merthyr Tydfil
South Yorkshire Barnsley,Doncaster,Rotherham,Sheffield
Staffordshire Cannock Chase,East Staffordshire,Lichfield,Newcastle-under-Lyme,South Staffordshire,Stafford,Staffordshire Moorlands,Stoke-on-Trent,Tamworth
Suffolk Ipswich,Suffolk Coastal,Waveney,Babergh,Forest Heath,Mid Suffolk,St Edmundsbury
Surrey Elmbridge,Epsom and Ewell,Guildford,Mole Valley,Reigate and Banstead,Runnymede,Spelthorne,Surrey Heath,Tandridge,Waverley,Woking
Sussex Adur,Arun,Brighton and Hove,Chichester,Crawley,Eastbourne,Hastings,Horsham,Lewes,Mid Sussex,Rother,Wealden,Worthing
Thames Valley Aylesbury Vale,Bracknell Forest,Cherwell,Chiltern,Milton Keynes,Oxford,Reading,Slough,South Bucks,South Oxfordshire,Vale of White Horse,West Berkshire,West Oxfordshire,Windsor and Maidenhead,Wokingham,Wycombe
Warwickshire North Warwickshire,Nuneaton and Bedworth,Rugby,Stratford-on-Avon,Warwick
West Mercia Herefordshire, County of,Malvern Hills,Worcester,Wychavon,Telford and Wrekin,Shropshire,Bromsgrove,Redditch,Wyre Forest
West Midlands Birmingham,Coventry,Dudley,Sandwell,Solihull,Walsall,Wolverhampton
West Yorkshire Bradford,Calderdale,Kirklees,Leeds,Wakefield
Wiltshire Swindon,Wiltshire

 

Appendix 2: Code

 

knitr::opts_chunk$set(echo = FALSE)

library(readxl)
library(ggplot2)
library(maps)
library(tidyverse)
library(dplyr)
library(leaflet)
library(viridis)
library(htmltools)
library(sf)
library(rnaturalearth)
library(tidyr)
library(tibble)
library(data.table)
library(plotly)
library(purrr)
library(RSQLite)
library(DBI)
library(knitr)
library(kableExtra)
  
#MAPPING TOP 60 NATIONALITIES IN UK:

nationality_data <- read_excel("C:/Users/sneha/OneDrive/Desktop/Semester 1/Data for Data Scientist/Assessment/Final/data/populationbycountryofbirthandnationalityjul20tojun21.xls", sheet = "2.3" )

#Data Processing
nationality_data <- nationality_data[-c(1:4), ]
#Making first row as header
colnames(nationality_data) <- as.character(unlist(nationality_data[1, ]))
nationality_data <- nationality_data[-1, ]
names(nationality_data)[1] <- "Population Rank"
names(nationality_data)[2] <- "Country"
names(nationality_data)[3] <- "Total Persons"
names(nationality_data)[5] <- "Total Males"
names(nationality_data)[7] <- "Total Females"

nationality_data <- nationality_data %>%
  mutate_at(vars(-2), as.numeric) #For percentages


world_map <- ne_countries(scale = "medium", returnclass = "sf")

merged_data <- merge(world_map, nationality_data, by.x = "admin", by.y = "Country", all.x = TRUE)

# Removing Antartica
filtered_data <- merged_data %>%
  filter(admin != "Antarctica")

global_origin_map <- ggplot(data = filtered_data, aes(fill = `Population Rank`)) +
  geom_sf(size = 0.1, color = "black") +
  scale_fill_viridis_c(
    option = "plasma",
    direction = 1,
    na.value = "white",
    trans = "reverse",
    name = expression(bold("Country Ranking by Total Immigrants (1 = Highest)")),  # Legend title
    breaks = c(1, 10, 20, 30, 40, 50, 60),  # Specific breaks
  ) +
  labs(
    title = expression(bold("Top 60 Nationalities: Global Origins of Residents in the United Kingdom (2020-2021)")),
    caption = "Source:Annual Population Survey(2021), Office for National Statistics UK"
  ) +
  theme(
    panel.grid = element_blank(),
    plot.title = element_text(hjust = 0.5),
    plot.caption = element_text(size = 6),
    panel.background = element_blank(),
    legend.position = "bottom",
    legend.direction = "horizontal",
    legend.box = "horizontal",  
    legend.key.height = unit(0.3, "cm"),
    legend.key.width = unit(2, "cm"),
    legend.title = element_text(size = 8)  # Adjust legend title size
  ) +
  guides(fill = guide_colorbar(title.position = "top", title.vjust = 2, label.vjust = 0.5))
global_origin_map

  
#REGION WISE POPULATION:

uk_nationality_data <- read_excel("C:/Users/sneha/OneDrive/Desktop/Semester 1/Data for Data Scientist/Assessment/Final/data/populationbycountryofbirthandnationalityjul20tojun21.xls", sheet = "2.1" )

#Data Processing
uk_nationality_data <- uk_nationality_data[-c(1:3), ]
colnames(uk_nationality_data) <- as.character(unlist(uk_nationality_data[1, ]))
uk_nationality_data <- uk_nationality_data[-1, ]
names(uk_nationality_data)[4]<- "Total_Population"
names(uk_nationality_data)[6] <- "UK_Nationals"
names(uk_nationality_data)[10] <-"European_Union_Nationals"
names(uk_nationality_data)[26] <- "ME_CAsia_Nationals"
names(uk_nationality_data)[28] <- "East_Asian_Nationals"
names(uk_nationality_data)[30] <- "South_Asian_Nationals"
names(uk_nationality_data)[32] <- "SEast_Asian_Nationals"
names(uk_nationality_data)[36] <- "SubSahara_African_Nationals"
names(uk_nationality_data)[38] <- "North_African_Nationals"
names(uk_nationality_data)[40] <- "North_American_Nationals"
names(uk_nationality_data)[42] <- "CS_American_Nationals"
names(uk_nationality_data)[44] <- "Oceania_Nationals"

#Converting to numeric for percentage calulations
columns_to_convert <- setdiff(seq_along(uk_nationality_data), c(1, 2, 3, 5))  
uk_nationality_data[columns_to_convert] <- lapply(uk_nationality_data[columns_to_convert], function(x) {
  numeric_values <- suppressWarnings(as.numeric(as.character(x)))
  ifelse(is.na(numeric_values), NA, numeric_values)
})

uk_regions<- uk_nationality_data %>%
  filter(Geography == 'Country') %>% 
  select("Area Code", Name, Geography,Total_Population, UK_Nationals, European_Union_Nationals,
         ME_CAsia_Nationals, East_Asian_Nationals, South_Asian_Nationals,
         SEast_Asian_Nationals, SubSahara_African_Nationals,
         North_African_Nationals, North_American_Nationals,
         CS_American_Nationals, Oceania_Nationals)
uk_regions<-uk_regions[-1,] #Removing Totals

uk_regions$Name <- tolower(uk_regions$Name)
uk_regions$Name <- sapply(strsplit(uk_regions$Name, " "), function(words) {
  paste(toupper(substring(words, 1, 1)), substring(words, 2), sep = "", collapse = " ")
})

# Calculating Percentages
columns_to_divide <- colnames(uk_regions)[5:length(colnames(uk_regions))]
uk_regions <- uk_regions %>%
  mutate(across(all_of(columns_to_divide), ~ ./ (Total_Population)*100))

#Mapping
#To suppress the message generated from sf package in R markdown
read_geojson <- function(file_path) {
  result <- capture.output({
    data <- sf::st_read(file_path)
  })
  
  # Filter out the undesired messages
  filtered_result <- result[!grepl("Reading layer|Simple feature collection", result)]
  
  invisible(data)
}
uk_boundary_geojson <- read_geojson("C:/Users/sneha/OneDrive/Desktop/Semester 1/Data for Data Scientist/Assessment/Final/data/united_kingdom_administrative_boundaries_province_polygon.geojson")


#Name Matching 
name_mapping <- tribble(
  ~name_geojson, ~Name_regions,
  "Alba / Scotland", "Scotland",
  "Cymru / Wales", "Wales",
  "Northern Ireland / Tuaisceart Éireann", "Northern Ireland",
  "England", "England"
)
combined_data <- uk_boundary_geojson %>%
  left_join(name_mapping, by = c("name" = "name_geojson")) %>%
  left_join(uk_regions, by = c("Name_regions" = "Name"))
#Removing duplicate columns
combined_data <- combined_data[!duplicated(combined_data$Name_regions), ]

uk_pop_map <- leaflet() %>%
  addTiles() %>%
  addPolygons(
    data = combined_data,
    fillColor = "yellow",
    fillOpacity = 0.1,
    label = lapply(1:nrow(combined_data), function(i) {
      html <- paste("<b>Name:</b>", combined_data$name[i],
                    "<br><b>Total Population:</b>", round(combined_data$Total_Population[i], 2),
                    "<br><b>UK Nationals (%):</b>", round(combined_data$UK_Nationals[i], 2),
                    "<br><b>European Union Nationals (%):</b>", round(combined_data$European_Union_Nationals[i], 2),
                    "<br><b>MiddleEast-Central Asia Nationals (%):</b>", round(combined_data$ME_CAsia_Nationals[i], 2),
                    "<br><b>East Asia Nationals (%):</b>", round(combined_data$East_Asian_Nationals[i], 2),
                    "<br><b>South Asia Nationals (%):</b>", round(combined_data$South_Asian_Nationals[i], 2),
                    "<br><b>South East Asia Nationals (%):</b>", round(combined_data$SEast_Asian_Nationals[i], 2),
                    "<br><b>Sub Saharan Africa Nationals (%):</b>", round(combined_data$SubSahara_African_Nationals[i], 2),
                    "<br><b>North Africa Nationals (%):</b>", round(combined_data$North_African_Nationals[i], 2),
                    "<br><b>North America Nationals (%):</b>", round(combined_data$North_American_Nationals[i], 2),
                    "<br><b>Central-South America Nationals (%):</b>", round(combined_data$CS_American_Nationals[i], 2),
                    "<br><b>Oceania Nationals (%):</b>", round(combined_data$Oceania_Nationals[i], 2))
      HTML(html)
    }),
    group = ~Name_regions, 
    layerId = ~Name_regions,
    highlightOptions = highlightOptions(color = "black", weight = 1,
                                        bringToFront = TRUE) 
  ) %>% 
  addLayersControl(
    overlayGroups = unique(combined_data$Name_regions),
    options = layersControlOptions(collapsed = FALSE)
  ) %>%
  addControl(
    html = '<div style="font-weight: bold; font-size: 14px;">Population of the United Kingdom by Country of Birth(2021-2022)</div>',
    position = "topleft") %>%
  addControl(
    html ='<div style="font-size: 8px">Source:Annual Population Survey(2021), Office for National Statistics UK </div>',
    position = "bottomleft"
  )
uk_pop_map  


  
#LOCAL AUTHORITIES: Ethnic Population
eth_datas <- read_csv("C:/Users/sneha/OneDrive/Desktop/Semester 1/Data for Data Scientist/Assessment/Final/data/population-by-ethnicity-and-local-authority-2021 (2).csv")
eth_data <- eth_datas[,c(4,6,8,9)]
eth_data$Ethnic_Percent <- round((eth_data$`Ethnic Population` / eth_data$`Local Authority Population`) * 100, 2)
# Filter out rows for whole of England and Wales
eth_data <- eth_data[!grepl("^All", eth_data$Geography), ]
eth_data <- eth_data[!grepl("^All", eth_data$Ethnicity), ]

uk_local_shapefile <- read_geojson("C:/Users/sneha/OneDrive/Desktop/Semester 1/Data for Data Scientist/Assessment/Final/data/LAD_May_2020_Boundaries_UK_BFE_2022_4839426458879395509.geojson")
map_local_data <- merge(uk_local_shapefile, eth_data, by.x = "lad20nm", by.y = "Geography")
# Converting to sf object
loc_pop_sf <- st_as_sf(map_local_data)
loc_pop_sf <- st_transform(loc_pop_sf, 4326)
# Convert to data.table for faster processing
setDT(loc_pop_sf)
loc_auth_pop <- loc_pop_sf[, .(Counts = paste(Ethnicity, "-", Ethnic_Percent,"%", collapse = "<br>")), by = .(lad20nm, long, lat)]
loc_auth_pop_map <- leaflet(data = loc_auth_pop) %>%
  addProviderTiles("CartoDB.Positron") %>%
  setView(lng = mean(loc_auth_pop$long, na.rm = TRUE), 
          lat = mean(loc_auth_pop$lat, na.rm = TRUE), 
          zoom = 5) %>%
  addCircleMarkers(
    lng = ~long,
    lat = ~lat,
    color = "red",
    radius = 0.2,
    stroke = FALSE,
    fillOpacity = 0.7,
    popup = ~paste("<b>Local Authority:</b>", lad20nm, "<br>",
                   "<b>Percentage in Local Population:</b><br>", Counts)
  )%>%
  addControl(
    html = "<h3 style='font-weight: bold; text-align: center; font-size: 14px; background-color: transparent;''>Ethnicity Distribution Across Local Authorities in England and Wales </h3>",
    position = "topleft"
  )%>%
  addControl(
    html ='<div style="font-size: 8px">Source:Census(2021), Office for National Statistics UK</div>',
    position = "bottomleft"
  )

  
#Police Mapping
uk_police_geojson <- read_geojson("C:/Users/sneha/OneDrive/Desktop/Semester 1/Data for Data Scientist/Assessment/Final/data/Police_Force_Areas_December_2022_EW_BUC_-6731240188049729726.geojson")
ggplot() +
  geom_sf(data = uk_police_geojson, fill = "transparent", color = "blue") +
  geom_sf_label(data = uk_police_geojson, aes(label = PFA22NM), size = 1.75,) +
  theme_void() +
  labs(title = "Police Force Areas: England and Wales")

#Police Force Area- Their Local Authorities Jurisdiction
police_local_juris <- read_csv("C:/Users/sneha/OneDrive/Desktop/Semester 1/Data for Data Scientist/Assessment/Final/data/Local_Authority_District_to_Community_Safety_Partnerships_to_Police_Force_Areas_(December_2018)_Lookup_in_England_and_Wales.csv")
names(police_local_juris)[2] <- "Local_Authority"
names(police_local_juris)[6] <- "Police_Force"
police_locals_juris<- police_local_juris[, c(2, 6)]
police_locals_juris_areas <- police_locals_juris %>%
  group_by(Police_Force) %>%
  summarize(Areas = paste(Local_Authority, collapse = ","))
police_local_juris_table <- data.frame(Police_Force = police_locals_juris_areas$Police_Force, Areas = police_locals_juris_areas$Areas)



  

# Merging the data based on Local_Authority and Geography columns
eth_police_datas <- eth_data %>%
  left_join(police_local_juris, by = c("Geography" = "Local_Authority"))
eth_police_data <- eth_police_datas[,c(1,2,3,4,10)]
#Areas left out: Finding Missing Values
na_indices <- which(is.na(eth_police_data$Police_Force))
geography_names_na <- eth_police_data$Geography[na_indices]


eth_police_data <- eth_police_data %>%
  mutate(
    Police_Force = case_when(
      Geography == "West Northamptonshire" ~ "Northamptonshire",
      Geography == "Buckinghamshire" ~ "Thames Valley",
      Geography == "Dorset" ~ "Dorset",
      Geography == "Bournemouth, Christchurch and Poole" ~ "Dorset",
      Geography == "North Northamptonshire" ~ "Northamptonshire",
      Geography %in% c("West Suffolk", "East Suffolk") ~ "Suffolk",
      Geography == "Somerset West and Taunton" ~ "Avon and Somerset",
      TRUE ~ Police_Force  
    )
  )
#Ethnicity data based on police territorial area
eth_police_areas <- eth_police_data %>%
  group_by(Police_Force, Ethnicity) %>%
  summarize(
    Ethnic_Population = sum(`Ethnic Population`),
    Local_Authority_Population = sum(`Local Authority Population`),
    .groups = "drop"
  )
eth_police_areas$Ethnicity_Percentage = round((eth_police_areas$Ethnic_Population/ eth_police_areas$Local_Authority_Population) * 100, 2)


map_pol_data <- merge(uk_police_geojson, eth_police_areas, by.x = "PFA22NM", by.y = "Police_Force")
# Converting to sf object
loc_pol_sf <- st_as_sf(map_pol_data)
loc_pol_sf <- st_transform(loc_pol_sf, 4326)
# Convert to data.table for faster processing
setDT(loc_pol_sf)
loc_pol_eth_pop <- loc_pol_sf[, .(Eth_Count = paste(Ethnicity, "-", Ethnicity_Percentage,"%", collapse = "<br>")), by = .(PFA22NM, LONG, LAT)]
loc_pol_eth_map <- leaflet(data = loc_pol_eth_pop) %>%
  addProviderTiles("CartoDB.Positron") %>%
  setView(lng = mean(loc_pol_eth_pop$LONG, na.rm = TRUE), 
          lat = mean(loc_pol_eth_pop$LAT, na.rm = TRUE), 
          zoom = 5) %>%
  addCircleMarkers(
    lng = ~LONG,
    lat = ~LAT,
    color = "purple",
    radius = 5,
    stroke = FALSE,
    fillOpacity = 0.7,
    popup = ~paste("<b>Policing Area:</b>", PFA22NM, "<br>",
                   "<b>Ethnicity in Local Population:</b><br>", Eth_Count)
  )%>%
  addControl(
    html = "<h3 style='font-weight: bold; text-align: center; font-size: 14px; background-color: transparent;''>Ethnicities as proportion of total population in each police jurisdiction in England and Wales </h3>",
    position = "topleft"
  )%>%
  addControl(
    html ='<div style="font-size: 8px">Source:Census(2021) and ONS Open Geography Portal, Office for National Statistics UK</div>',
    position = "bottomleft"
  )
loc_pol_eth_map
#NON ETHNIC POPULATION- POLICE OFFICERS

police_areas_eth <- read_csv("C:/Users/sneha/OneDrive/Desktop/Semester 1/Data for Data Scientist/Assessment/Final/data/by-ethnicity-and-area-police-officers-table.csv")
#Data Cleaning
#Removing BTP
police_areas_eth <- police_areas_eth[-5, ]
#Adding ethnicity to column names
police_areas_eth[1, 5:7] <- lapply(police_areas_eth[1, 5:7], function(x) paste("Asian",x, sep = "_"))
police_areas_eth[1, 8:10] <- lapply(police_areas_eth[1, 8:10], function(x) paste("Black",x, sep = "_"))
police_areas_eth[1, 11:13] <- lapply(police_areas_eth[1, 11:13], function(x) paste("Mixed",x, sep = "_"))
police_areas_eth[1, 14:16] <- lapply(police_areas_eth[1, 14:16], function(x) paste("White",x, sep = "_"))
police_areas_eth[1, 17:19] <- lapply(police_areas_eth[1, 17:19], function(x) paste("Other",x, sep = "_"))
police_areas_eth[1, 20:22] <- lapply(police_areas_eth[1, 20:22], function(x) paste("Unknown",x, sep = "_"))
# Changing col names
police_areas_eth[1, ] <- lapply(police_areas_eth[1, ], function(x) gsub("% of", "", x))
colnames(police_areas_eth) <- as.character(unlist(police_areas_eth[1, ]))
police_areas_eth <- police_areas_eth[-1, ]
police_areas_eth <- police_areas_eth %>%
  mutate(Geography = ifelse(Geography == "Devon and Cornwall", "Devon & Cornwall", Geography))
#Obtaining Population of Local Authority
result <- left_join(
  police_areas_eth,
  eth_police_areas %>% select(Police_Force, Local_Authority_Population),
  by = c("Geography" = "Police_Force")
)
police_ethnic_juris<- unique(result)
# Columns to convert to numeric and remove commas
cols_to_convert <- names(police_ethnic_juris)[2:ncol(police_ethnic_juris)]
for (col in cols_to_convert) {
  police_ethnic_juris[[col]] <- as.numeric(gsub(",", "", as.character(police_ethnic_juris[[col]])))
}
#Calculating Police-Population Ratio
police_ethnic_juris$prop_police <- round((police_ethnic_juris$`Number of police officers (FTE)` / police_ethnic_juris$Local_Authority_Population) * 100, 2)
#Calculating non-white ethnic population
police_ethnic_juris$non_white_ethnicity_percent <- rowSums(
  police_ethnic_juris[, c("Asian_ population", "Black_ population", "Mixed_ population", "Other_ population")],
  na.rm = TRUE
)
police_ethnic_juris$non_white_ethnicity_population <- police_ethnic_juris$non_white_ethnicity_percent * police_ethnic_juris$Local_Authority_Population
#Removing Total statistics for England+Wales
police_ethnic_juris <- police_ethnic_juris[-1, ]

#Calculating Correlation: Non-White Ethnic Population and Police Officers
correlation <- cor(police_ethnic_juris$non_white_ethnicity_population, police_ethnic_juris$`Number of police officers (FTE)`)
cor_text <- paste("Correlation:", round(correlation, 2))
#Plotting
corr_police_eth <- ggplot(police_ethnic_juris, aes(x = non_white_ethnicity_population / 1e+09, y = `Number of police officers (FTE)`)) +
  geom_point() +
  labs(x = "\n Non-White Ethnic Population (in billion)", y = "Number of Police Officers \n", caption="Source: Police Wokforce,England and Wales(2021)- Home Office UK \n Census(2021)-Office for National Statistics UK") +
  ggtitle("Correlation between Police Deployment and Non-White Ethnic Population \n") +
  geom_label(aes(label = cor_text), x = Inf, y = Inf, hjust = 3, vjust = 5, size = 4, color = "red", show.legend = FALSE)+
  theme(plot.title = element_text(hjust = 0.5, size=14),
    plot.caption = element_text(size = 6))
# POLICE ETHNICITY PLOT

police_columns <- c("Asian_ police officers", 
                    "Black_ police officers", 
                    "Mixed_ police officers", 
                    "White_ police officers",
                    "Other_ police officers")

data_plot_police <- police_ethnic_juris %>%
  select(Geography, all_of(police_columns))

data_melted_police <- data_plot_police %>%
  pivot_longer(cols = -Geography, names_to = "Ethnicity", values_to = "Count")

police_ethnicity_plot <- ggplot(data_melted_police, aes(x = "", y = Count, fill = Ethnicity)) +
  geom_bar(stat = "identity", width = 1) +
  facet_wrap(~Geography, nrow = 7) +
  coord_polar("y", start = 0) +
  labs(x = NULL, y = NULL,
       caption = "Source: Police Workforce, England and Wales (2021) - Home Office UK and Office for National Statistics UK") +
  ggtitle("Ethnic Composition of Police Officers Across Police Force Areas \n") +
  scale_fill_brewer(palette = "Set1", labels = c("Asian", "Black", "Mixed", "White", "Other")) +
  theme_void() +
  theme(legend.position = "right",
        legend.key.size = unit(0.2, "cm"),
        legend.text = element_text(size = 7),
        legend.title = element_text(size = 8),
        legend.box.margin = margin(0.5, 0.5, 0.5, 0.5, "cm"),
        plot.margin = margin(2, 2, 2, 2, "cm"),
        strip.text = element_text(size = 7, hjust = 0.5),
        plot.title = element_text(hjust = 0.5, size = 14),
        plot.caption = element_text(size = 6),
        legend.box.background = element_rect(color = "black", linetype = "solid", size = 1))

police_ethnicity_plot


total_eth_ew <- eth_datas %>% 
  filter(Geography == "All - England And Wales") %>% 
  filter(Ethnicity=="Asian"|Ethnicity=="Black"|Ethnicity=="Mixed"|Ethnicity=="White"|Ethnicity=="Other")

total_eth_ew <- total_eth_ew[, c(4, 6, 8, 9)]

fte_columns <- grep("(FTE)", names(police_ethnic_juris), value = TRUE)

sum_of_fte <- lapply(fte_columns, function(col_name) {
  sum_value <- sum(police_ethnic_juris[[col_name]], na.rm = TRUE)
  return(data.frame(Officer_Ethnicity = col_name, Total_Officers = sum_value))
})

sum_of_fte_df <- do.call(rbind, sum_of_fte)
total_officers_byeth <- sum_of_fte_df[-1,]
total_officers_byeth$Officer_Ethnicity <- str_replace(total_officers_byeth$Officer_Ethnicity, "_.*", "")

ew_eth_officers<-  merge(total_eth_ew, total_officers_byeth, by.x = "Ethnicity", by.y = "Officer_Ethnicity", all = TRUE)

ew_eth_officers <- ew_eth_officers %>% 
  filter(Ethnicity!= "Unknown")

ew_eth_officers$Repres <- (ew_eth_officers$Total_Officers/ew_eth_officers$`Ethnic Population`)*1000

ggplot(ew_eth_officers, aes(x = Ethnicity, y = Repres, fill = Ethnicity)) +
  geom_bar(stat = "identity", color = "black") +
  geom_text(aes(label = round(Repres, 2)), vjust = -0.5, size = 3) + # Displaying value labels
  labs(
    x = "\n Ethnicity",
    y = "Representation (per 1000 individuals) \n",
    title = "Police Representation by Ethnicity \n",
    caption = "Source: Police Workforce, England and Wales (2021) - Home Office UK \nCensus (2021) - Office for National Statistics UK"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 14),
    plot.caption = element_text(size = 6),
    legend.position = "none" # Hides the legend if not needed
  ) +
  scale_fill_brewer(palette = "Set1")
##STOP AND SEARCH DATA

#Unzipping the zipped folder to extract data
folder_path <- "C:/Users/sneha/OneDrive/Desktop/Semester 1/Data for Data Scientist/Assessment/Final/data/a5b42e3ac24aa7ae5e0f355749cf7f837162c43a.zip"
unzip("C:/Users/sneha/OneDrive/Desktop/Semester 1//Data for Data Scientist/Assessment/Final/data/a5b42e3ac24aa7ae5e0f355749cf7f837162c43a.zip", exdir = "unzipped_folder")
unzip_file_list <- list.files(path = "unzipped_folder", pattern = "\\-stop-and-search.csv$", recursive = TRUE, full.names = TRUE)

#Creating a single table with column shpwing area of incidences
ssd <- unzip_file_list %>%
  map_df(~ {
    area_name <- str_extract(.x, "(?<=unzipped_folder/\\d{4}-\\d{2}/\\d{4}-\\d{2}-)\\w+(-\\w+)*(?=-stop-and-search.csv)")  # Extract area name
    read_csv(.x) %>%
      mutate(area = area_name)
  })

#Data Cleaning
ssd$area <- gsub("-", " ", ssd$area)
ssd <- ssd %>% filter(area != "btp")
ssd <- ssd %>% arrange(area)
ssd$ExtractedEthnicity <- ifelse(
  is.na(ssd$`Self-defined ethnicity`),
  NA,
  str_extract(ssd$`Self-defined ethnicity`, "(.*?)(?= -|$)")
)

names(ssd)[8]<- "Age"
names(ssd)[9]<- "Suspect_Ethnicity"
names(ssd)[10]<- "Officer_Ethnicity"
names(ssd)[12]<- "Search_Object"
names(ssd)[14]<- "Outcome_Search_Object"
names(ssd)[15]<- "Additional_Search"
names(ssd)[17] <- "Suspect_Major_Ethnicity"
ssd <- ssd %>%
  mutate(Suspect_Major_Ethnicity = case_when(
    Suspect_Major_Ethnicity == "White" ~ "White",
    Suspect_Major_Ethnicity == "Asian/Asian British" ~ "Asian",
    Suspect_Major_Ethnicity == "Black/African/Caribbean/Black British" ~ "Black",
    Suspect_Major_Ethnicity == "Mixed/Multiple ethnic groups" ~ "Mixed",
    Suspect_Major_Ethnicity == "Other ethnic group" ~ "Other",
    is.na(Suspect_Major_Ethnicity) ~ "NA", 
    TRUE ~ as.character(Suspect_Major_Ethnicity)
  ))
stop_search <- dbConnect(SQLite(), "stop_search.db")
dbWriteTable(stop_search, "ssd", ssd, overwrite = TRUE)
#Area and Incidences per Ethnicity

area_eth_query <- "SELECT Suspect_Major_Ethnicity, area, Count(*)
FROM ssd
WHERE 
  Suspect_Major_Ethnicity != 'NA'
GROUP BY Suspect_Major_Ethnicity, area
ORDER BY area"
area_eth <- dbGetQuery(stop_search, area_eth_query)

#Data Cleaning
area_eth$area <- tolower(area_eth$area)
uk_police_geojson$PFA22NM <- tolower(uk_police_geojson$PFA22NM)
uk_police_geojson <- uk_police_geojson %>%
  mutate(PFA22NM = recode(PFA22NM,
                          "metropolitan police" = "metropolitan",
                          "devon & cornwall" = "devon and cornwall",
                          "dyfed and powys" = "dyfed powys",
                          "london, city of" = "city of london"))

#Getting Local Authority pop for each ethinicty per police area

police_area_eth_pop <- eth_police_areas %>%
  filter(Ethnicity=="Asian"|Ethnicity=="Black"|Ethnicity=="Mixed"|Ethnicity=="White"|Ethnicity=="Other")
police_area_eth_pop$Police_Force <- tolower(police_area_eth_pop$Police_Force)
police_area_eth_pop <- police_area_eth_pop %>%
  mutate(Police_Force = recode(Police_Force,
                          "metropolitan police" = "metropolitan",
                          "devon & cornwall" = "devon and cornwall",
                          "dyfed-powys" = "dyfed powys",
                          "london, city of" = "city of london"))

map_area_eth_data <- merge(uk_police_geojson, area_eth, by.x = "PFA22NM", by.y = "area")
map_area_eth_data <- merge(police_area_eth_pop, map_area_eth_data,
                     by.x = c("Police_Force", "Ethnicity"),
                     by.y = c("PFA22NM", "Suspect_Major_Ethnicity"),
                     all = TRUE)
map_area_eth_data <- map_area_eth_data[!is.na(map_area_eth_data$LONG) & !is.na(map_area_eth_data$LAT), ]
sf_map_area_eth_data <- st_as_sf(map_area_eth_data, coords = c("LONG", "LAT"), crs = 4326)

# Converting to sf object
sf_area_eth <- st_as_sf(map_area_eth_data)
sf_area_eth <- st_transform(sf_area_eth, 4326)
# Convert to data.table for faster processing
setDT(sf_area_eth)
total_cases_area_eth <- sf_area_eth %>%
  group_by(Police_Force) %>%
  summarise(total_cases = sum(`Count(*)`))

sf_area_eth <- sf_area_eth %>%
  mutate(Percentage = (`Count(*)` / Ethnic_Population) * 100)

area_eth_pop <- sf_area_eth[, .(Ethnicity = paste(Ethnicity, "-", round(Percentage,2)," %", collapse = "<br>")), by = .(Police_Force, LONG, LAT)]
area_eth_pop_map <- leaflet(data = area_eth_pop) %>%
  addProviderTiles("CartoDB.Positron") %>%
  setView(lng = mean(sf_area_eth$LONG, na.rm = TRUE), 
          lat = mean(sf_area_eth$LAT, na.rm = TRUE), 
          zoom = 5) %>%
  addCircleMarkers(
    lng = ~LONG,
    lat = ~LAT,
    color = "purple",
    radius = 5,
    stroke = FALSE,
    fillOpacity = 0.7,
    popup = ~paste("<b>Policing Area:</b>", toupper(Police_Force), "<br>",
                   "<b>Cases per Ethnicity:</b><br>", Ethnicity)
  )%>%
  addControl(
    html = "<h3 style='font-weight: bold, text-align: center; font-size: 14px; background-color: transparent;''> Stop and Search Incidents Relative to Ethnic Population Sizes in England and Wales </h3>",
    position = "topleft"
  )%>%
  addControl(
    html ='<div style="font-size: 8px">Source: Stop and Search Data collection(2021), Home OfficeUK</div>',
    position = "bottomleft"
  )
area_eth_pop_map
# Total Suspects: ETHNICITY

count_suspect <- "
SELECT Suspect_Major_Ethnicity, COUNT(*) as Count,
       COUNT(*) * 100 / (
            SELECT COUNT(*)
            FROM ssd
            WHERE Suspect_Major_Ethnicity != 'NA'
        ) AS Percentage
FROM ssd
WHERE Suspect_Major_Ethnicity != 'NA'
GROUP BY Suspect_Major_Ethnicity"

tot_sus <-dbGetQuery(stop_search, count_suspect)
tot_sus_pie <- ggplot(tot_sus, aes(x = "", y = Percentage, fill = Suspect_Major_Ethnicity)) +
  geom_bar(stat = "identity", width = 1, color = "white") +
  coord_polar("y", start = 0) +
  labs(title = "General Overview:Ethnicity Distribution (Absolute) in Stop and Search Incidents",
       caption = "Source: Stop and Search Data Collection (2021)-Home Office UK",
       fill = "Suspect Major Ethnicity") +
  scale_fill_discrete(name = "Suspect Major Ethnicity") +
  geom_text(aes(label = paste0(Percentage, "%")), position = position_stack(vjust = 0.5))+
  theme_void()+
  scale_fill_brewer(palette = "Set1")+
  theme(legend.box.background = element_rect(color = "black", linetype = "solid", size = 1),
        plot.title = element_text(size=12),
        plot.caption = element_text(size = 6))
tot_sus_pie

#GENDER PROFILE
gender_query <- "
SELECT 
  Suspect_Major_Ethnicity,
  Gender,
  COUNT(*) as Count,
  COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (PARTITION BY Suspect_Major_Ethnicity) as Percentage
FROM ssd
WHERE 
  Suspect_Major_Ethnicity != 'NA'
  AND Gender != 'NA'
GROUP BY Suspect_Major_Ethnicity, Gender
ORDER BY Suspect_Major_Ethnicity, Gender"

gender_eth <- dbGetQuery(stop_search, gender_query)
ggplot(gender_eth, aes(x = "", y = Percentage, fill = Gender)) +
  geom_bar(stat = "identity", width = 1, color = "white") +
  geom_text(aes(label = paste0(Gender,": ", round(Percentage, 2), "%"), 
                x = ifelse(Gender == "Other", 1.5, ifelse(Gender == "Female", 1, 0.5))), 
            position = position_stack(vjust = 0.5), 
            size = 3, color = "black") +
  coord_polar("y", start = 0) +
  facet_wrap(~Suspect_Major_Ethnicity) +
  labs(title = "General Overview: Gender Distribution Within Ethnic Groups in Stop and Search Incidents \n",
       caption = "Source: Stop and Search Data Collection (2021)-Home Office UK",
       fill = "Gender",
       y = "Percentage") +
  theme_void() +
  theme(legend.position = "right",
        legend.box.background = element_rect(color = "black", linetype = "solid", size = 1),
        plot.title = element_text(size=12),
        plot.caption = element_text(size = 6))+
  scale_fill_brewer(palette = "Accent")



suspect_by_eth <- merge(total_eth_ew, tot_sus, by.x = "Ethnicity", by.y = "Suspect_Major_Ethnicity")
suspect_by_eth$Represent <- (suspect_by_eth$Count/suspect_by_eth$`Ethnic Population`)*1000
ggplot(suspect_by_eth, aes(x = Ethnicity, y = Represent, fill = Ethnicity)) +
  geom_bar(stat = "identity", position = "dodge", color = "black") +
  geom_text(aes(label = paste0(round(Represent, 2))), 
            position = position_dodge(width = 0.9), vjust = -0.5, size = 3)+
  labs(
    x = "\n Ethnicity",
    y = "Stop and Search Rates (per 1000 individuals) \n ",
    title = "Stop and Search Rates per 1,000 Population of Each Respective Ethnicity \n",
    caption= "Source: Stop and Search Data Collection (2021)-Home Office UK \n Census(2021)- Office for National Statistics UK"
  ) +
  scale_fill_brewer(palette = "Set1") +
  theme(axis.text.x = element_text(hjust = 0.5),
    legend.box.background = element_rect(color = "black", linetype = "solid", size = 1),
    legend.title = element_text(size = 14),
    axis.title.x = element_text(size = 11),
    axis.title.y = element_text(size = 11),
    plot.title = element_text(size=14),
    plot.caption = element_text(size = 6)
  )
#Ethnicity of Officer Vs Ethnicity of Suspect

query_eth <- "
WITH EthnicityCounts AS (
  SELECT Officer_Ethnicity, SUM(Count) AS TotalCount
  FROM (
    SELECT Officer_Ethnicity, COUNT(*) AS Count
    FROM ssd
    WHERE Officer_Ethnicity IS NOT NULL 
      AND Suspect_Major_Ethnicity IS NOT NULL 
      AND Suspect_Major_Ethnicity != 'NA'
    GROUP BY Officer_Ethnicity, Suspect_Major_Ethnicity
  ) AS SubCounts
  GROUP BY Officer_Ethnicity
)

SELECT s.Officer_Ethnicity, s.Suspect_Major_Ethnicity, 
       s.Count * 1.0 / t.TotalCount AS Proportion
FROM (
  SELECT Officer_Ethnicity, Suspect_Major_Ethnicity, COUNT(*) AS Count
  FROM ssd
  WHERE Officer_Ethnicity IS NOT NULL 
    AND Suspect_Major_Ethnicity IS NOT NULL 
    AND Suspect_Major_Ethnicity != 'NA'
  GROUP BY Officer_Ethnicity, Suspect_Major_Ethnicity
) AS s
JOIN EthnicityCounts t ON s.Officer_Ethnicity = t.Officer_Ethnicity
"

sus_off_eth <- dbGetQuery(stop_search, query_eth)

sus_off_eth_map <- ggplot(sus_off_eth, aes(x = Officer_Ethnicity, y = Proportion, fill = Suspect_Major_Ethnicity)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Ethnic Breakdown: Proportion of Suspects Stopped by Officers of Varied Ethnicities \n",
       x = "\n Officer Ethnicity",
       y = "Proportion \n",
       caption= "Source: Stop and Search Data Collection (2021)-Home Office UK") +
  theme_minimal() +
  theme(axis.text.x = element_text(hjust = 0.5),
    legend.box.background = element_rect(color = "black", linetype = "solid", size = 1),
    legend.title = element_text(size = 14),
    axis.title.x = element_text(size = 11),
    axis.title.y = element_text(size = 11),
    plot.title = element_text(size=14),
    plot.caption = element_text(size = 6)
  ) +
  scale_fill_brewer(palette = "Set1", name = "Suspects Ethnicity")
sus_off_eth_map

# Object-Ethnicity Association
obj_eth <- "SELECT 
    t.Search_Object,
    t.Suspect_Major_Ethnicity,
    t.Count,
    t.Count * 100 / total.TotalCount AS Proportion
FROM (
    SELECT 
        Search_Object,
        Suspect_Major_Ethnicity,
        COUNT(*) AS Count
    FROM ssd 
    WHERE Search_Object IN ('Controlled drugs', 'Offensive weapons', 'Psychoactive substances', 'Evidence of wildlife offences')
      AND Suspect_Major_Ethnicity != 'NA'
    GROUP BY Search_Object, Suspect_Major_Ethnicity
) AS t
JOIN (
    SELECT 
        Search_Object,
        COUNT(*) AS TotalCount
    FROM ssd 
    WHERE Search_Object IN ('Controlled drugs', 'Offensive weapons', 'Psychoactive substances', 'Evidence of wildlife offences')
      AND Suspect_Major_Ethnicity != 'NA'
    GROUP BY Search_Object
) AS total ON t.Search_Object = total.Search_Object"

obj_eth_table <- dbGetQuery(stop_search, obj_eth)

object_eth <- merge(obj_eth_table,total_eth_ew, by.x="Suspect_Major_Ethnicity", by.y ="Ethnicity")
object_eth$Represent <- (object_eth$Count/object_eth$`Ethnic Population`)*1000

ggplot(object_eth, aes(x = Suspect_Major_Ethnicity, y = Represent, fill = Search_Object)) +
  geom_bar(stat = "identity", position = "dodge") +
  facet_wrap(~Search_Object, scales = "free", labeller = labeller(Search_Object = c(
    "Controlled drugs" = "Controlled Drugs",
    "Evidence of wildlife offences" = "Wildlife Offences",
    "Offensive weapons" = "Offensive Weapons",
    "Psychoactive substances"= "Psychoactive Substances"
  ))) +
  labs(
    x = "\n Suspect's Ethnicity",
    y = "Cases (per 1000 individuals) \n",
    title = "Representation of Ethnicities in Stop and Search Incidents per 1000 Population by Search Object Category \n",
    caption= "Source: Stop and Search Data Collection (2021)-Home Office UK \n Census(2021)- Office for National Statistics UK",
    fill = "Search Object"
  ) +
  theme(axis.text.x = element_text(hjust = 0.5),
    legend.box.background = element_rect(color = "black", linetype = "solid", size = 1),
    legend.title = element_text(size = 14),
    axis.title.x = element_text(size = 11),
    axis.title.y = element_text(size = 11),
    plot.title = element_text(size=12),
    plot.caption = element_text(size = 6)
  )

#OUTCOME-ETHNICITY 
outcome_query <- "
SELECT 
  Suspect_Major_Ethnicity,
  SUM(CASE WHEN Outcome = 'A no further action disposal' THEN 1 ELSE 0 END) AS No_Action_Count,
  SUM(CASE WHEN Outcome = 'Arrest' THEN 1 ELSE 0 END) AS Arrest_Count,
  SUM(CASE WHEN Outcome NOT IN ('A no further action disposal', 'Arrest') THEN 1 ELSE 0 END) AS Warnings_Count,
  ROUND(SUM(CASE WHEN Outcome = 'A no further action disposal' THEN 1 ELSE 0 END) * 100.0 / total.No_Action_Total, 2) AS No_Action_Percentage,
  ROUND(SUM(CASE WHEN Outcome = 'Arrest' THEN 1 ELSE 0 END) * 100.0 / total.Arrest_Total, 2) AS Arrest_Percentage,
  ROUND(SUM(CASE WHEN Outcome NOT IN ('A no further action disposal', 'Arrest') THEN 1 ELSE 0 END) * 100.0 / total.Warnings_Total, 2) AS Warnings_Percentage
FROM ssd
CROSS JOIN (
  SELECT 
    SUM(CASE WHEN Outcome = 'A no further action disposal' THEN 1 ELSE 0 END) AS No_Action_Total,
    SUM(CASE WHEN Outcome = 'Arrest' THEN 1 ELSE 0 END) AS Arrest_Total,
    SUM(CASE WHEN Outcome NOT IN ('A no further action disposal', 'Arrest') THEN 1 ELSE 0 END) AS Warnings_Total
  FROM ssd
  WHERE 
    Outcome IS NOT NULL 
    AND Outcome != 'NA'
    AND Suspect_Major_Ethnicity IS NOT NULL
    AND Suspect_Major_Ethnicity != 'NA'
) AS total
WHERE 
  Outcome IS NOT NULL 
  AND Outcome != 'NA'
  AND Suspect_Major_Ethnicity IS NOT NULL
  AND Suspect_Major_Ethnicity != 'NA'
GROUP BY Suspect_Major_Ethnicity"

outh_eth_table <- dbGetQuery(stop_search, outcome_query)
#Tidy Format
percentages_data <- outh_eth_table %>%
  select(Suspect_Major_Ethnicity, matches("Count"))
outp_eth_final <- percentages_data %>%
  pivot_longer(cols = -Suspect_Major_Ethnicity, names_to = "Percentage_Type", values_to = "Count_Value")
output_ethnicity <- merge(outp_eth_final,total_eth_ew,by.x="Suspect_Major_Ethnicity", by.y="Ethnicity")
output_ethnicity$Represent <- (output_ethnicity$Count_Value/output_ethnicity$`Ethnic Population`)*1000

ggplot(output_ethnicity, aes(x = Suspect_Major_Ethnicity, y = Represent)) +
  geom_bar(stat = "identity", position = "dodge", aes(fill = Suspect_Major_Ethnicity)) +
  geom_text(aes(label = round(Represent, 2)), position = position_dodge(width = 0.9), vjust = -0.5) +
  facet_wrap(~ Percentage_Type, labeller = labeller(Percentage_Type = c(
    "Arrest_Count" = "Arrest",
    "No_Action_Count" = "No Action",
    "Warnings_Count" = "Warning"
  ))) +
  labs(
    x = "\n Suspect Ethnicity",
    y = "Representation (per 1000 individuals) \n",
    title = "Results of Stop-Search Outcomes per 1000 Individuals Across Ethnicities \n",
    fill = "Suspect Ethnicity",
    caption= "Source: Stop and Search Data Collection (2021)-Home Office UK \n Census(2021)- Office for National Statistics UK"
  ) +
  theme(axis.text.x = element_text(hjust = 0.5),
    legend.box.background = element_rect(color = "black", linetype = "solid", size = 1),
    legend.title = element_text(size = 14),
    axis.title.x = element_text(size = 11),
    axis.title.y = element_text(size = 11),
    plot.title = element_text(hjust = 0.5, size=14),
    plot.caption = element_text(size = 6)
  )+
  scale_fill_brewer(palette = "Set1", name = "Suspects Ethnicity")

# CONFIDENCE IN POLICE DATA
conf_data <- read_csv("C:/Users/sneha/OneDrive/Desktop/Semester 1/Data for Data Scientist/Assessment/Final/data/by-ethnicity-over-time-table.csv")

#DATA CLEANING
# Combine the first row elements with existing column names
new_col_names <- paste(conf_data[1,], names(conf_data))
# Assign the new column names to the dataframe
names(conf_data) <- new_col_names
conf_data <- conf_data[-1, ]
clean_col_names <- gsub("\\.{3}[0-9]+", "", colnames(conf_data))
colnames(conf_data) <- clean_col_names
# Identify columns with "%" and get years
percent_cols <- grep("%", names(conf_data))
years <- gsub(".*% (\\d{4}/\\d{2})", "\\1", names(conf_data)[percent_cols])
# Update column names with years
sample_size_cols <- grep("Sample size", names(conf_data))
for (i in 1:length(sample_size_cols)) {
  col_index <- sample_size_cols[i]
  year <- years[i]
  colnames(conf_data)[col_index] <- paste("Sample size", year)
}
# Replace spaces with underscores in column names
colnames(conf_data) <- gsub(" ", "_", colnames(conf_data))
# Creating overall_etnicity column for categorisation: Major Ethnicities
conf_data <- conf_data %>%
  mutate(
    overall_ethnicity = case_when(
      Ethnicity_ == "All" ~ "All",
      between(row_number(), which(Ethnicity_ %in% c("Asian")), which(Ethnicity_ == "Asian other")) ~ "Asian",
      between(row_number(), which(Ethnicity_ %in% c("Black")), which(Ethnicity_ == "Black other")) ~ "Black",
      between(row_number(), which(Ethnicity_ %in% c("Mixed")), which(Ethnicity_ == "Mixed other")) ~ "Mixed",
      between(row_number(), which(Ethnicity_ %in% c("White")), which(Ethnicity_ == "White other")) ~ "White",
      Ethnicity_ == "Other" ~ "Other",
      TRUE ~ NA_character_
    ),
    overall_ethnicity = ifelse(is.na(overall_ethnicity), lag(overall_ethnicity), overall_ethnicity)
  )
#Data Cleaning
# Remove commas in all columns except the first and last
cols_to_clean <- names(conf_data)[-c(1, length(conf_data))]
conf_data[, cols_to_clean] <- lapply(conf_data[, cols_to_clean], function(x) gsub(",", "", x))
# Replace "?" with NA in the entire dataframe
conf_data[conf_data == "?"] <- NA
cols_to_convert <- names(conf_data)[-c(1, ncol(conf_data))]
# Convert selected columns to numeric
conf_data <- conf_data %>%
  mutate(across(all_of(cols_to_convert), as.numeric))

#Data Processing
# Select columns containing "%" and "Sample_size" in their names
cols_to_plot <- grep("%", names(conf_data), value = TRUE)
ethnicities_to_plot <- c("Asian", "Black", "Mixed", "White", "Other")
major_eth_conf_data <- conf_data %>%
  filter(Ethnicity_ %in% ethnicities_to_plot)
# Melt the data for plotting
major_eth_conf_data <- major_eth_conf_data %>%
  select(one_of(cols_to_plot), Ethnicity_) %>%
  pivot_longer(cols = -Ethnicity_, names_to = "Measure", values_to = "Value")

# Plotting: Major Ethnicities Confidence
conf_major_eth <- ggplot(major_eth_conf_data, aes(x = Measure, y = Value, group = Ethnicity_, color = Ethnicity_)) +
  geom_line(size = 1) + 
  geom_point() +
  labs(title = "Police Confidence Trends Across Ethnicities \n",
       x = "\n Time Period",
       y = "% of People \n",
       caption = "\n Source: Crime Survey for England and Wales (2021)",
       color = "Ethnicities") +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 14),
    axis.text.x = element_text(hjust = 1),
    plot.caption = element_text(size = 6),
    legend.box.background = element_rect(color = "black", linetype = "solid", size = 1),
    legend.title = element_text(size = 14),
    axis.title.x = element_text(size = 11),
    axis.title.y = element_text(size = 11),
    axis.line = element_line(color = "black", size = 0.5)  # Customize axis lines
  ) +
  scale_x_discrete(labels = function(x) gsub("%_", "", x)) +
  scale_color_brewer(palette = "Set1", name = "Ethnicity")

conf_major_eth

#Data Processing for Sub ethnicities within major ethnicities
conf_sub_eth_data <- conf_data[-1,]
conf_sub_eth_data <- conf_sub_eth_data %>%
  filter(Ethnicity_ != overall_ethnicity) %>% 
  filter(Ethnicity_ != "White Gypsy/Traveller") #As % not mentioned

conf_sub_eth_data <- conf_sub_eth_data %>%
  mutate(Ethnicity_ = factor(Ethnicity_, levels = unique(conf_sub_eth_data$Ethnicity_)))

conf_eth_sub_data_long <- conf_sub_eth_data %>%
  pivot_longer(
    cols = -c(Ethnicity_, overall_ethnicity),
    names_to = "Measure",
    values_to = "Value"
  )
conf_eth_sub_data_long <- conf_eth_sub_data_long %>%
  filter(!grepl("Sample", Measure))

conf_sub_eth_plot <- ggplot(conf_eth_sub_data_long, aes(x = Measure, y = Value, fill = overall_ethnicity)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_text(aes(label = Value), position = position_dodge(width = 0.9), vjust = -0.5,size=2) + 
  facet_wrap(~ Ethnicity_, scales = "free", nrow = 4) +
  labs(title = "Police Confidence Trends Across Major Sub-Ethnicities \n",
       x = "\n Time Period",
       y = "% of People \n",
       caption = "\n Source: Crime Survey for England and Wales (2021)",
       fill = "Major Ethnicities") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5, size=14),
        axis.text.x = element_text(angle = 45, hjust = 1,size=4),
        plot.caption = element_text(size=6),
        legend.box.background = element_rect(color = "black", linetype = "solid", size = 1),
        legend.title = element_text(size = 14),
        axis.title.x = element_text(size = 11),
        axis.title.y = element_text(size = 11)
        ) +
  scale_fill_brewer(palette = "Set1")+
  scale_x_discrete(labels = function(x) gsub("%_", "", x))+
  coord_cartesian(ylim = c(0, 100))

conf_sub_eth_plot


  
kable(police_local_juris_table, format = "html") %>%
  kable_styling(full_width = FALSE) %>%
  column_spec(c(1:2), border_left = TRUE, border_right = TRUE) %>% 
  row_spec(0, bold = TRUE)  

# this chunk generates the complete code appendix. 
# eval=FALSE tells R not to run (``evaluate'') the code here (it was already run before).